NSF-ITR/IM PROJECT: 2004 Abstracts From Bits to Information: Statistical Learning Technologies for Digital Information Management Search

Author

  • Brian Leung
Abstract

Project Title: Term Informativeness PI: T. Jaakkola Participants: Jason Rennie and Tommi Jaakkola (MIT CSAIL) Abstract: Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. For named entity extraction, knowing how topic-centric, or “informative,” a word is can provide valuable additional information. We introduce a new informativeness score based on mixture models for the task of extracting restaurant names from bulletin board posts. By combining the mixture score with IDF, we are able to achieve significant gains on a restaurant extraction task. We also motivate and discuss a Bayesian version of the score, which would better capture the variability in term occurrence rates.
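As a point of reference for the IDF component mentioned in the abstract, the following is a minimal sketch of the textbook inverse document frequency, log(N / df). It is not the project's mixture-based score, whose exact form the abstract does not give. The function name `idf_scores`, the toy posts, and the restaurant name "luigis" are all hypothetical and used only for illustration.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Return {term: log(N / df)} for a list of tokenized documents.

    N is the number of documents and df is the number of documents
    that contain the term at least once (standard IDF, no smoothing).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical bulletin-board posts, already lowercased and tokenized.
posts = [
    ["the", "pasta", "at", "luigis", "was", "great"],
    ["the", "service", "was", "slow"],
    ["try", "the", "pasta", "special"],
]
scores = idf_scores(posts)
```

In this toy corpus a word that appears in every post ("the") scores zero, while a word confined to a single post ("luigis") scores highest, which is the sense in which IDF serves as a rough informativeness signal.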

Similar resources

NSF-ITR/IM PROJECT: 2001 Abstracts From Bits to Information: Statistical Learning Technologies for Digital Information Management Search

Project Title: Polycategorical Categorization for Personalized Information Filtering PI: T. Hofmann Participants: Ioannis Tsochantaridis and Thomas Hofmann Abstract: Polycategorical categorization is an extension of standard classification in which items are labeled by multiple binary labels. We are particularly interested in cases with large numbers of overlapping categories and a priori unkno...

NSF-ITR/IM PROJECT: 2002 Abstracts From Bits to Information: Statistical Learning Technologies for Digital Information Management Search

Project Title: Support Vector Machines for Multiple Instance Learning PI: T. Hofmann Participants: Stuart Andrews and Thomas Hofmann Abstract: Multiple Instance Learning (MIL) is an important generalization of standard supervised binary classification. In MIL, labels are not available for individual training patterns but are associated with sets of patterns, which introduces additional uncertai...

NSF-ITR/IM PROJECT From Bits to Information: Statistical Learning Technologies for Digital Information Management Search

Project Title: Polycategorical Categorization for Personalized Information Filtering PI: T. Hofmann Participants: Ioannis Tsochantaridis and Thomas Hofmann Abstract: Polycategorical categorization is an extension of standard classification in which items are labeled by multiple binary labels. We are particularly interested in cases with large numbers of overlapping categories and a priori unkno...

The Use of Pocket Computers and Smartphones in Accessing Health Information

Background and Aim: Today, one of the challenges doctors face is how to access medical information as quickly as possible. Personal Digital Assistants (PDAs) and smartphones are information technologies that can be used to access health information. This study aimed to review the most important uses of Personal Digital Assistants and smartphones in medicine and in accessing health inform...

Publication date: 2004